The expeditions/ dataset anchors the entire spatial structure of the Pristine Seas Science Database. It provides a canonical registry of expeditions, spatial hierarchy, and field deployments across all survey methods. These tables enable consistent joins, filtering, and spatial aggregation across datasets.
1 Tables
1.1expeditions.info
This table stores metadata for each expedition conducted by the Pristine Seas team. Each row corresponds to a unique expedition and defines the top-level context for all associated sampling efforts.
Table 1: expeditions.info Table Schema
Field
Type
Required
Description
exp_id
STRING
true
Unique expedition identifier in the format ISO3_YEAR (e.g., FJI_2025)
number
INTEGER
true
Sequential expedition number (e.g., 43)
name
STRING
true
Official or working expedition name
country
STRING
true
Primary country or jurisdiction visited
start_date
DATE
true
Expedition start date in ISO 8601 format (YYYY-MM-DD)
end_date
DATE
true
Expedition end date in ISO 8601 format (YYYY-MM-DD)
lead
STRING
true
Overall expedition lead
science_lead
STRING
true
Lead scientist for the research campaign
vessel
STRING
false
Name of the vessel or platform used
ship_track
STRING
false
WKT LINESTRING representing the ship’s route (optional)
partners
STRING
false
Institutional collaborators (optional)
description
STRING
false
Brief description of expedition goals and scope
notes
STRING
false
Optional free-text notes or admin metadata
To date, our team has conducted 47 expeditions across 30 countries since our first one in 2007.
Figure 1: Timeline of expeditions
1.2expeditions.sites
This table defines the primary spatial unit of deployment for each survey method. A site represents a discrete application of a method (e.g., UVS, pBRUV, submersible) at a specific location and time during an expedition. Each method maintains its own internal sites table, but this shared expeditions.sites table serves as the canonical registry for cross-method integration, mapping, and analysis.
What is a site?
A site is defined by a unique combination of exp_id, method, and a 3-digit site number (e.g., FJI_2025_uvs_001)
It represents one deployment, diver, or survey event of a given method at a location
Each site may contain multiple stations (e.g., depth strata, replicates, rigs)
Examples:
A UVS site might be a reef where divers conduct fish, benthic, and invertebrate surveys at different depths
A pBRUV site could be a pelagic deployment of five camera rigs
A submersible site is typically a single dive with multiple horizontal transects
Sites are the parent unit of stations and the core join key for spatial summaries, mapping, and integration across methods.
All methods share the following core site schema (Table 2)
Table 2: expeditions.sites Table Schema
Field
Type
Required
Description
ps_site_id
STRING
true
Unique site ID (exp_id_method_###), e.g., FJI_2025_uvs_001
exp_id
STRING
true
Foreign key to expeditions.info
method
STRING
true
Field method used (e.g., uvs, pbruv, edna, sub)
region
STRING
true
Broad geographic or administrative unit (e.g., Murat, Chocó, Tuamotu, Duff Islands).
subregion
STRING
true
Intermediate feature within the region such as an atoll, island, gulf, or reef complex
locality
STRING
false
Local named feature such as a village, bay, cove, reef (e.g., Lolieng, Ensenada Utría).
date
DATE
true
Date of the site-level deployment in ISO 8601 format (YYYY-MM-DD)
time
TIME
true
Local time of the deployment in 24-hour format (HH:MM:SS)
latitude
FLOAT
true
Approximate latitude (decimal degrees, WGS84)
longitude
FLOAT
true
Approximate longitude (decimal degrees, WGS84)
lead
STRING
true
Name of the lead scientist or survey team lead
notes
STRING
false
Optional comments, metadata, or field observations
---title: "Expeditions Dataset"format: html: theme: lux self-contained: true code-fold: true toc: true toc-depth: 3 toc-location: right number-sections: true number-depth: 3---```{r setup, message = F, warning = F, fig.width = 10, fig.height = 10, echo = F}options(scipen = 999)library(PristineSeasR)library(bigrquery)library(gt)library(gtExtras)library(tibble)library(tidyverse)library(dplyr)library(ggplot2)library(plotly)library(lubridate)knitr::opts_chunk$set(eval = F, warning = F, message = F, include = F, echo = F)ps_paths <- PristineSeasR::get_sci_drive_paths()prj_path <- file.path(ps_paths$projects, "legacy-db")ps_data_path <- ps_paths$datasetsbigrquery::bq_auth(email = "marine.data.science@ngs.org")project_id <- "pristine-seas"bq_connection <- DBI::dbConnect(bigrquery::bigquery(), project = project_id)```The `expeditions/` dataset anchors the entire spatial structure of the Pristine Seas Science Database. It provides a canonical registry of expeditions, spatial hierarchy, and field deployments across all survey methods. These tables enable consistent joins, filtering, and spatial aggregation across datasets.## Tables### `expeditions.info`This table stores metadata for each expedition conducted by the Pristine Seas team. Each row corresponds to a unique expedition and defines the top-level context for all associated sampling efforts.```{r eval = T, include = T}#| label: tbl-exp.info#| tbl-cap: "expeditions.info Table Schema"exp.info_fields <- tribble( ~field, ~type, ~required, ~description, "exp_id", "STRING", TRUE, "Unique expedition identifier in the format ISO3_YEAR (e.g., FJI_2025)", "number", "INTEGER", TRUE, "Sequential expedition number (e.g., 43)", "name", "STRING", TRUE, "Official or working expedition name", "country", "STRING", TRUE, "Primary country or jurisdiction visited", "start_date", "DATE", TRUE, "Expedition start date in ISO 8601 format (YYYY-MM-DD)", "end_date", "DATE", TRUE, "Expedition end date in ISO 8601 format (YYYY-MM-DD)", "lead", "STRING", TRUE, "Overall expedition lead", "science_lead", "STRING", TRUE, "Lead scientist for the research campaign", "vessel", "STRING", FALSE, "Name of the vessel or platform used", "ship_track", "STRING", FALSE, "`WKT LINESTRING` representing the ship's route (optional)", "partners", "STRING", FALSE, "Institutional collaborators (optional)", "description", "STRING", FALSE, "Brief description of expedition goals and scope", "notes", "STRING", FALSE, "Optional free-text notes or admin metadata")exp.info_fields |> gt() |> cols_label(field = md("**Field**"), type = md("**Type**"), required = md("**Required**"), description = md("**Description**")) |> cols_width(field ~ px(200), type ~ px(100), required ~ px(80), description ~ px(500)) |> data_color(columns = c(field), fn = scales::col_factor(palette = c("#f6f6f6"), domain = NULL) ) |> tab_options(table.font.size = px(13), table.width = pct(100)) |> fmt_tf(columns = required, tf_style = "true-false") |> fmt_markdown(columns = description) |> gt_theme_nytimes()``````{r eval = T, include = T}# Read and clean the raw dataexp_info_url <- "https://docs.google.com/spreadsheets/d/1YVjkTWcxB1UseknmGvGmayeKP2fH_9UIbubksmkSGMU/edit#gid=0"expeditions_data <- googlesheets4::read_sheet(exp_info_url)expeditions_clean <- expeditions_data %>% janitor::clean_names() %>% separate(date, into = c("start_date", "end_date"), sep = " - ") %>% rename(exp_id = expedition_id, number = expedition_number, name = expedition_name, lead = expedition_leader, science_lead = science_lead, country = country, vessel = ship_name, partners = partners) %>% mutate(start_date = ymd(start_date), end_date = ymd(end_date), ship_track = NA_character_, description = NA_character_, notes = NA_character_) %>% select(exp_id, number, name, country, start_date, end_date, lead, science_lead, vessel, ship_track, partners, description, notes)```To date, our team has conducted **`r nrow(expeditions_clean)`** expeditions across **`r length(unique(expeditions_clean$country))`** countries since our first one in 2007. ```{r eval = T, include = T}#| label: fig-expeditions-timeline#| fig-cap: "Timeline of expeditions"#| fig-width: 10#| fig-height: 8#| expeditions_timeline <- expeditions_clean %>% mutate(year = year(start_date)) %>% filter(!is.na(year)) %>% arrange(year) %>% group_by(year) %>% summarise(n = n(), expedition_list = paste(name, collapse = "<br>"), .groups = "drop") %>% mutate(cumulative = cumsum(n))# Elegant cumulative line plotp <- ggplot(expeditions_timeline, aes(x = year, y = cumulative, group = 1, text = paste0("<b>Year:</b> ", year, "<br>", "<b>Cumulative:</b> ", cumulative, "<br>", "<b>Expeditions:</b><br>", expedition_list))) + geom_line(color = "#1F77B4", linewidth = 1.3, lineend = "round") + geom_point(color = "#1F77B4", size = 3) + scale_x_continuous(breaks = unique(expeditions_timeline$year)) + labs(title = "Pristine Seas Expeditions", subtitle = "Number of expeditions through time", x = NULL, y = NULL) + theme_minimal(base_family = "Inter") + theme(plot.title = element_text(size = 18, face = "bold"), axis.text = element_text(size = 12), panel.grid.major = element_line(color = "#eaeaea"), panel.grid.minor = element_blank(), axis.line = element_blank())# Convert to interactive plotlyggplotly(p, tooltip = "text") %>% layout(hoverlabel = list(bgcolor = "#ffffff", font = list(family = "Inter", size = 13)), margin = list(t = 60, b = 40, l = 30, r = 30), hovermode = "x unified")``````{r create-dataset, include = FALSE}if (!bq_dataset_exists(bq_dataset("pristine-seas", "expeditions"))) { bq_dataset_create( bq_dataset("pristine-seas", "expeditions"), description = "Core reference dataset for the Pristine Seas Science Database. Contains metadata and spatial hierarchy for all expeditions, including expedition info, sites, and stations. Provides the backbone for spatial joins, standardization, and integration across all survey methods." )}# Define table schemaexp.info_schema <- exp.info_fields |> mutate(mode = if_else(required, "REQUIRED", "NULLABLE")) |> transmute(name = field, type = type, mode = mode, description = description) |> pmap(function(name, type, mode, description) { list(name = name, type = type, mode = mode, description = description) })bq_table_create(bq_table(project_id, "expeditions", "info"), fields = exp.info_schema)bq_table_upload(bq_table("pristine-seas", "expeditions", "info"), values = expeditions_clean, create_disposition = "CREATE_NEVER", write_disposition = "WRITE_APPEND")```### `expeditions.sites`This table defines the primary spatial unit of deployment for each survey method. A site represents a discrete application of a method (e.g., UVS, pBRUV, submersible) at a specific location and time during an expedition. Each method maintains its own internal sites table, but this shared `expeditions.sites` table serves as the canonical registry for cross-method integration, mapping, and analysis.::: {.callout-tip title="What is a site?"}A site is defined by a unique combination of `exp_id`, `method`, and a 3-digit site number _(e.g., `FJI_2025_uvs_001`)_- It represents one **deployment, diver, or survey event** of a given method at a location - Each site may contain **multiple stations** (e.g., depth strata, replicates, rigs)**Examples:** - A **UVS site** might be a reef where divers conduct fish, benthic, and invertebrate surveys at different depths - A **pBRUV site** could be a pelagic deployment of five camera rigs - A **submersible site** is typically a single dive with multiple horizontal transectsSites are the **parent unit** of stations and the core join key for spatial summaries, mapping, and integration across methods.:::All methods share the following core site schema (@tbl-exp.sites)```{r eval = T, include = T}#| label: tbl-exp.sites#| tbl-cap: "expeditions.sites Table Schema"exp.sites_fields <- tribble( ~field, ~type, ~required, ~description, "ps_site_id", "STRING", TRUE, "Unique site ID (`exp_id_method_###`), e.g., `FJI_2025_uvs_001`", "exp_id", "STRING", TRUE, "Foreign key to `expeditions.info`", "method", "STRING", TRUE, "Field method used (e.g., `uvs`, `pbruv`, `edna`, `sub`)", "region", "STRING", TRUE, "Broad geographic or administrative unit (e.g., Murat, Chocó, Tuamotu, Duff Islands).", "subregion", "STRING", TRUE, "Intermediate feature within the region such as an atoll, island, gulf, or reef complex", "locality", "STRING", FALSE, "Local named feature such as a village, bay, cove, reef (e.g., Lolieng, Ensenada Utría).", "date", "DATE", TRUE, "Date of the site-level deployment in ISO 8601 format (`YYYY-MM-DD`)", "time", "TIME", TRUE, "Local time of the deployment in 24-hour format (`HH:MM:SS`)", "latitude", "FLOAT", TRUE, "Approximate latitude (decimal degrees, WGS84)", "longitude", "FLOAT", TRUE, "Approximate longitude (decimal degrees, WGS84)", "lead", "STRING", TRUE, "Name of the lead scientist or survey team lead", "notes", "STRING", FALSE, "Optional comments, metadata, or field observations")exp.sites_fields |> gt() |> cols_label(field = md("**Field**"), type = md("**Type**"), required = md("**Required**"), description = md("**Description**")) |> cols_width(field ~ px(200), type ~ px(100), required ~ px(80), description ~ px(500)) |> data_color(columns = c(field), fn = scales::col_factor(palette = c("#f6f6f6"), domain = NULL) ) |> tab_options(table.font.size = px(13), table.width = pct(100)) |> fmt_tf(columns = required, tf_style = "true-false") |> fmt_markdown(columns = description) |> gt_theme_nytimes()```